Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illumos 6267 - dn_bonus evicted too early #3874

Closed
wants to merge 1 commit into from

Conversation

nedbass
Copy link
Contributor

@nedbass nedbass commented Oct 2, 2015

The bonus buffer associated with a dnode is expected to remain resident
until: the dnode is evicted via dnode_buf_pageout(), the dnode is
freed in dnode_sync_free(), or the objset containing the dnode is
evicted via dmu_objset_evict(). However, since bonus buffers (and DMU
buffers in general) can have draining references when these events occur,
dbuf_rele_and_unlock() has logic to ensure that once these late references
are released affected buffers will be evicted.

dbuf_rele_and_unlock() currently checks for a dbuf for an evicting
objset via the os->os_evicting flag, and detects buffers for a freed
dnode by testing dn->dn_type and dn->dn_free_txg fields. Unfortunately,
the free'd dnode test can fire prematurely - anytime after the dnode is
scheduled to be freed via dnode_free() until the free is committed in
dnode_sync_free(). If all references to the bonus buffer are dropped
within this window, the bonus buffer will be evicted and code in
dnode_sync() that relies on the bonus buffer will fail.

Additionally, the "free'd dnode test" isn't applied to normal buffers
(buffers that are not the bonus buffer) and there is no mechanism to
guarantee eviction in the dnode_buf_pageout() case (the dnode is not
being freed nor is the objset being evicted).

Replace the two existing deferred eviction mechanisms with a per-dbuf
flag, db_pending_evict. This is set when explicit eviction is requested
via either dnode_evict_dbufs() or dnode_evict_bonus(). These actions
only occur after it is safe for dnode buffers to be evicted (e.g. the
bonus buffer will not be referenced again).

uts/common/fs/zfs/sys/dbuf.h:
Add comments for boolean fields in dmu_buf_impl_t.

Add the db_pending_evict field.

uts/common/fs/zfs/sys/dbuf.h:
uts/common/fs/zfs/dbuf.c:
Rename db_immediate_evict to db_user_immediate_evict to avoid
confusion between dbuf user state eviction and deferred eviction
of a dbuf.

uts/common/fs/zfs/dbuf.c:
Consistently use TRUE/FALSE for boolean fields in
dmu_buf_impl_t.

Simplify pending eviction logic to use the new db_pending_evict
flag in all cases.

uts/common/fs/zfs/dmu_objset.c:
uts/common/fs/zfs/sys/dmu_objset.h:
Remove objset_t's os_evicting field. This same functionality
is now provided by db_pending_evict.

uts/common/fs/zfs/dnode_sync.c:
In dnode_evict_dbufs() and dnode_evict_bonus(), mark dbufs
with draining references (dbufs that can't be evicted inline)
as pending_evict.

In dnode_sync_free(), remove ASSERT() that a dnode being free'd
has no active dbufs. This is usually the case, but is not
guaranteed due to draining references. (e.g. The deadlist for a
deleted dataset may still be open if another thread referenced
the dataset just before it was freed and the dsl_dataset_t hasn't
been released or is still being evicted).

#3865
#3443
Ported-by: Ned Bass [email protected]

The bonus buffer associated with a dnode is expected to remain resident
until: the dnode is evicted via dnode_buf_pageout(), the dnode is
freed in dnode_sync_free(), or the objset containing the dnode is
evicted via dmu_objset_evict(). However, since bonus buffers (and DMU
buffers in general) can have draining references when these events occur,
dbuf_rele_and_unlock() has logic to ensure that once these late references
are released affected buffers will be evicted.

dbuf_rele_and_unlock() currently checks for a dbuf for an evicting
objset via the os->os_evicting flag, and detects buffers for a freed
dnode by testing dn->dn_type and dn->dn_free_txg fields. Unfortunately,
the free'd dnode test can fire prematurely - anytime after the dnode is
scheduled to be freed via dnode_free() until the free is committed in
dnode_sync_free(). If all references to the bonus buffer are dropped
within this window, the bonus buffer will be evicted and code in
dnode_sync() that relies on the bonus buffer will fail.

Additionally, the "free'd dnode test" isn't applied to normal buffers
(buffers that are not the bonus buffer) and there is no mechanism to
guarantee eviction in the dnode_buf_pageout() case (the dnode is not
being freed nor is the objset being evicted).

Replace the two existing deferred eviction mechanisms with a per-dbuf
flag, db_pending_evict. This is set when explicit eviction is requested
via either dnode_evict_dbufs() or dnode_evict_bonus(). These actions
only occur after it is safe for dnode buffers to be evicted (e.g. the
bonus buffer will not be referenced again).

uts/common/fs/zfs/sys/dbuf.h:
	Add comments for boolean fields in dmu_buf_impl_t.

	Add the db_pending_evict field.

uts/common/fs/zfs/sys/dbuf.h:
uts/common/fs/zfs/dbuf.c:
	Rename db_immediate_evict to db_user_immediate_evict to avoid
	confusion between dbuf user state eviction and deferred eviction
	of a dbuf.

uts/common/fs/zfs/dbuf.c:
	Consistently use TRUE/FALSE for boolean fields in
	dmu_buf_impl_t.

	Simplify pending eviction logic to use the new db_pending_evict
	flag in all cases.

uts/common/fs/zfs/dmu_objset.c:
uts/common/fs/zfs/sys/dmu_objset.h:
	Remove objset_t's os_evicting field. This same functionality
	is now provided by db_pending_evict.

uts/common/fs/zfs/dnode_sync.c:
	In dnode_evict_dbufs() and dnode_evict_bonus(), mark dbufs
	with draining references (dbufs that can't be evicted inline)
	as pending_evict.

	In dnode_sync_free(), remove ASSERT() that a dnode being free'd
	has no active dbufs. This is usually the case, but is not
	guaranteed due to draining references. (e.g. The deadlist for a
	deleted dataset may still be open if another thread referenced
	the dataset just before it was freed and the dsl_dataset_t hasn't
	been released or is still being evicted).

openzfs#3865
openzfs#3443
Ported-by: Ned Bass <[email protected]>
@ryao
Copy link
Contributor

ryao commented Oct 6, 2015

This change looks good to me. I have provided feedback on the review board:

https://reviews.csiden.org/r/245/

We were bitten by this issue at work:

https://clusterhq.atlassian.net/browse/ZFS-37

@behlendorf
Copy link
Contributor

Merged as:

bc4501f Illumos 6267 - dn_bonus evicted too early

@behlendorf behlendorf closed this Oct 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants